4 research outputs found

    Clickstream Data Analysis: A Clustering Approach Based on Mixture Hidden Markov Models

    Nowadays, the availability of devices such as laptops and cell phones enables one to browse the web at any time and place. As a consequence, a company needs a website in order to maintain or increase customer loyalty and reach potential new customers. Besides acting as a virtual point of sale, the company portal allows it to obtain insights on potential customers through clickstream data, web-generated data that track users' accesses and activities on websites. However, these data are not easy to handle, as they are complex, unstructured and limited by a lack of clear information about user intentions and goals. Clickstream data analysis is a suitable tool for managing the complexity of these datasets, yielding a cleaned and processed sequential data frame ready for pattern identification and analysis. Analysing clickstream data is important for companies, as it enables them to understand differences in web user behaviour while users explore websites, how they move from one page to another and what they select, in order to define business strategies targeting specific types of potential customers. To obtain this level of insight, it is pivotal to understand how to exploit the hidden information in clickstream data. This work presents the cleaning and pre-processing procedures needed to turn clickstream data into a structured sequential dataset, and analyses these sequences by applying Mixtures of discrete-time Hidden Markov Models (MHMMs), a statistical tool suitable for clickstream data analysis and profile identification that has not been widely used in this context. Specifically, the hidden Markov process accounts for a time-varying latent variable to handle uncertainty and groups together observed states based on unknown similarity. Applying it entails identifying both the number of mixture components relating to the subpopulations and the number of latent states for each latent Markov chain.
However, the application of MHMMs requires the identification of both the number of components and the number of states. Information Criteria (IC) are generally used for model selection in mixture hidden Markov models and, although their performance has been widely studied for mixture models and hidden Markov models, they have received little attention in the MHMM context. The most widely used criterion is BIC, even though its performance for these models depends on factors such as the number of components and the sequence length. Another class of model selection criteria is the Classification Criteria (CC). They were defined specifically for clustering purposes and rely on an entropy measure to account for separability between groups. These criteria are clearly the best option for our purpose, but their application as model selection tools for MHMMs requires the definition of a suitable entropy measure. In the light of these considerations, this work proposes a classification criterion based on an integrated classification likelihood approach for MHMMs that accounts for the two latent classes in the model: the subpopulations and the hidden states. This criterion is a modified ICL BIC, a classification criterion that was originally defined in the mixture model context and later used in hidden Markov models. ICL BIC is a suitable score for identifying the number of classes (components or states); to extend it to MHMMs, we defined a joint entropy accounting for both a component-related entropy and a state-related conditional entropy. The thesis presents a Monte Carlo simulation study comparing the performance of the selection criteria. The results point out the limitations of the most commonly used information criteria and show that the proposed criterion outperforms them in identifying components and states, especially for short sequences, which are quite common in website accesses.
The proposed selection criterion was applied to real clickstream data collected from the website of a Sicilian company operating in the hospitality sector. The data were modelled by an MHMM, which identified clusters related to the browsing behaviour of web users and provided essential indications for developing new business strategies. This thesis is structured as follows: after an introduction to the main topics in Chapter 1, we present the clickstream data and their cleaning and pre-processing steps in Chapter 2; Chapter 3 illustrates the structure and estimation algorithms of mixture hidden Markov models; Chapter 4 presents a review of model selection criteria and the definition of the proposed ICL BIC for MHMMs; the real clickstream data analysis follows in Chapter 5.
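
The relationship between BIC and an entropy-penalised classification criterion mentioned in the abstract above can be illustrated with a minimal sketch. It shows only the standard mixture-model form of ICL BIC (BIC plus twice the classification entropy of the posterior component memberships), not the joint component-and-state entropy proposed in the thesis, whose exact definition is not given here. The function names, the sign convention (lower is better) and the choice of the number of sequences as the sample size are assumptions made for illustration.

```python
import math


def bic(log_lik, n_params, n_obs):
    """BIC in the 'lower is better' form: -2 log L + k log n."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


def classification_entropy(posteriors):
    """Entropy of posterior component memberships.

    posteriors: list of rows, one per sequence; each row holds the
    posterior probabilities of the mixture components and sums to 1.
    Terms with probability 0 contribute nothing (0 * log 0 := 0).
    """
    total = 0.0
    for row in posteriors:
        for p in row:
            if p > 0.0:
                total -= p * math.log(p)
    return total


def icl_bic(log_lik, n_params, posteriors):
    """Mixture-model ICL BIC: BIC plus twice the classification entropy.

    Assumption: the sample size is the number of sequences, i.e. the
    number of rows in `posteriors`.
    """
    n_obs = len(posteriors)
    return bic(log_lik, n_params, n_obs) + 2.0 * classification_entropy(posteriors)


# Hypothetical values, for illustration only.
well_separated = [[1.0, 0.0], [0.0, 1.0]]
overlapping = [[0.5, 0.5], [0.5, 0.5]]
print(icl_bic(-100.0, 5, well_separated))
print(icl_bic(-100.0, 5, overlapping))
```

With perfectly separated clusters the entropy term vanishes and the score equals BIC; overlapping clusters raise the score, which is how a classification criterion penalises poorly separated solutions.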

    Model selection procedure for mixture hidden Markov models

    This paper proposes a model selection procedure to identify the number of clusters and hidden states in discrete Mixture Hidden Markov Models (MHMMs). The model selection is based on a step-wise approach that uses information criteria and an entropy criterion as scores. By means of a simulation study, we show that our procedure performs better than classical model selection methods in identifying the correct number of clusters and hidden states or an approximation of the

    Analysis of clickstream data with mixture hidden markov models

    Clickstream data is an important source of information for businesses; however, it is not easy to manage, and converting the information it yields into a competitive advantage is not a trivial task. This study considers the application of mixture hidden Markov models to clickstream data extracted from a travel services company’s e-commerce portal. We find clusters related to web users’ browsing behaviour and geographical position that provide essential indications for developing new business strategies.

    Sustainable tourism: Measures, evidence and future prospects.

    This chapter provides a critical review of the quantitative approaches introduced so far in the literature to measure sustainable tourism. We focus on both the variables and the methods employed to obtain synthetic indicators of sustainable tourism, and offer some critical insights on potential new challenges caused by the current pandemic. For example, some dimensions of tourism sustainability may become more relevant in future because of people's increasing attention to health and safety. Finally, the chapter focuses on studies measuring sustainable tourism in the Mediterranean area.